San Marcos
ST-GraphNet: A Spatio-Temporal Graph Neural Network for Understanding and Predicting Automated Vehicle Crash Severity
Mimi, Mahmuda Sultana, Islam, Md Monzurul, Tusti, Anannya Ghosh, Somvanshi, Shriyank, Das, Subasish
Understanding the spatial and temporal dynamics of automated vehicle (AV) crash severity is critical for advancing urban mobility safety and infrastructure planning. In this work, we introduce ST-GraphNet, a spatio-temporal graph neural network framework designed to model and predict AV crash severity by using both fine-grained and region-aggregated spatial graphs. Using a balanced dataset of 2,352 real-world AV-related crash reports from Texas (2024), including geospatial coordinates, crash timestamps, SAE automation levels, and narrative descriptions, we construct two complementary graph representations: (1) a fine-grained graph with individual crash events as nodes, where edges are defined via spatio-temporal proximity; and (2) a coarse-grained graph where crashes are aggregated into Hexagonal Hierarchical Spatial Indexing (H3)-based spatial cells, connected through hexagonal adjacency. Each node in the graph is enriched with multimodal data covering semantic, spatial, and temporal attributes, including textual embeddings of crash narratives from a pretrained Sentence-BERT model. We evaluate various graph neural network (GNN) architectures, such as Graph Convolutional Networks (GCN), Graph Attention Networks (GAT), and Dynamic Spatio-Temporal GCN (DSTGCN), to classify crash severity and predict high-risk regions. Our proposed ST-GraphNet, which utilizes a DSTGCN backbone on the coarse-grained H3 graph, achieves a test accuracy of 97.74%, substantially outperforming the best fine-grained model (64.7% test accuracy). These findings highlight the effectiveness of spatial aggregation, dynamic message passing, and multi-modal feature integration in capturing the complex spatio-temporal patterns underlying AV crash severity.
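As a rough illustration of the fine-grained graph described above, the sketch below links two crash events when they are close in both space and time. The abstract does not state the thresholds or the distance measure, so the 1 km radius, the 24-hour window, the haversine distance, the function names, and the toy records are all assumptions made for this example, not the authors' settings.

```python
import math
from datetime import datetime, timedelta


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def build_proximity_edges(crashes, max_km=1.0, max_gap=timedelta(hours=24)):
    """Return undirected edges (i, j), i < j, for crashes close in space AND time.

    max_km and max_gap are illustrative assumptions, not values from the paper.
    """
    edges = []
    for i in range(len(crashes)):
        for j in range(i + 1, len(crashes)):
            a, b = crashes[i], crashes[j]
            near = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km
            recent = abs(a["time"] - b["time"]) <= max_gap
            if near and recent:
                edges.append((i, j))
    return edges


# Toy records invented for the example (not taken from the dataset).
crashes = [
    {"lat": 29.8833, "lon": -97.9414, "time": datetime(2024, 3, 1, 8, 0)},
    {"lat": 29.8850, "lon": -97.9400, "time": datetime(2024, 3, 1, 17, 30)},  # nearby, same day
    {"lat": 30.2672, "lon": -97.7431, "time": datetime(2024, 3, 1, 9, 0)},    # tens of km away
    {"lat": 29.8833, "lon": -97.9414, "time": datetime(2024, 6, 1, 8, 0)},    # same spot, months later
]
edges = build_proximity_edges(crashes)
print(edges)
```

Only the first two records satisfy both conditions, so they form the single edge. The pairwise loop is quadratic in the number of crashes; a spatial index would be the usual choice at larger scale.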
- North America > United States > California (0.14)
- North America > United States > Texas > Hays County > San Marcos (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Transportation > Ground > Road (1.00)
- Transportation > Infrastructure & Services (0.94)
- Information Technology (0.93)
A Multicollinearity-Aware Signal-Processing Framework for Cross-$β$ Identification via X-ray Scattering of Alzheimer's Tissue
Bashit, Abdullah Al, Nepal, Prakash, Makowski, Lee
X-ray scattering measurements of in situ human brain tissue encode structural signatures of pathological cross-$β$ inclusions, yet systematic exploitation of these data for automated detection remains challenging due to substrate contamination, strong inter-feature correlations, and limited sample sizes. This work develops a three-stage classification framework for identifying cross-$β$ structural inclusions, a hallmark of Alzheimer's disease, in X-ray scattering profiles of post-mortem human brain. Stage 1 employs a Bayes-optimal classifier to separate mica substrate from tissue regions on the basis of their distinct scattering signatures. Stage 2 introduces a multicollinearity-aware, class-conditional correlation pruning scheme with formal guarantees on the induced Bayes risk and approximation error, thereby reducing redundancy while retaining class-discriminative information. Stage 3 trains a compact neural network on the pruned feature set to detect the presence or absence of cross-$β$ fibrillar ordering. The top-performing model, optimized with a composite loss combining Focal and Dice objectives, attains a test F1-score of 84.30% using 11 of 211 candidate features and 174 trainable parameters. The overall framework yields an interpretable, theory-grounded strategy for data-limited classification problems involving correlated, high-dimensional experimental measurements, exemplified here by X-ray scattering profiles of neurodegenerative tissue.
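The composite loss mentioned above can be sketched for the binary case as a weighted sum of a focal term and a soft Dice term. The abstract does not give the weighting or hyperparameters, so `alpha`, `gamma`, and `smooth` below are assumed values, and the function names are chosen for this example only.

```python
import math


def focal_loss(probs, labels, gamma=2.0, eps=1e-7):
    """Mean binary focal loss: -(1 - p_t)^gamma * log(p_t)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
        p_t = min(max(p_t, eps), 1.0 - eps)     # clamp to keep log() finite
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


def dice_loss(probs, labels, smooth=1.0):
    """Soft Dice loss: 1 - (2 * sum(p * y) + s) / (sum(p) + sum(y) + s)."""
    intersection = sum(p * y for p, y in zip(probs, labels))
    return 1.0 - (2.0 * intersection + smooth) / (sum(probs) + sum(labels) + smooth)


def composite_loss(probs, labels, alpha=0.5):
    """Weighted sum of focal and Dice losses; alpha is an assumed mixing weight."""
    return alpha * focal_loss(probs, labels) + (1.0 - alpha) * dice_loss(probs, labels)


labels = [1, 0, 1]
good = composite_loss([0.9, 0.1, 0.8], labels)   # confident and mostly correct
bad = composite_loss([0.2, 0.7, 0.3], labels)    # confident and mostly wrong
print(good < bad)
```

The focal term down-weights easy examples, while the Dice term scores overlap with the positive class as a whole, which is one common reason for pairing the two on imbalanced data.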
- North America > United States > Massachusetts > Suffolk County > Boston (0.05)
- Asia > Nepal (0.04)
- North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.04)
FairAD: Computationally Efficient Fair Graph Clustering via Algebraic Distance
Vuong, Minh Phu, Lee, Young-Ju, Ojeda-Ruiz, Iván, Lee, Chul-Ho
Due to the growing concern about unsavory behaviors of machine learning models toward certain demographic groups, the notion of 'fairness' has recently drawn much attention from the community, thereby motivating the study of fairness in graph clustering. Fair graph clustering aims to partition the set of nodes in a graph into $k$ disjoint clusters such that the proportion of each protected group within each cluster is consistent with the proportion of that group in the entire dataset. It is, however, computationally challenging to incorporate fairness constraints into existing graph clustering algorithms, particularly for large graphs. To address this problem, we propose FairAD, a computationally efficient fair graph clustering method. It first constructs a new affinity matrix based on the notion of algebraic distance such that fairness constraints are imposed. A graph coarsening process is then performed on this affinity matrix to find representative nodes that correspond to $k$ clusters. Finally, a constrained minimization problem is solved to obtain the solution of fair clustering. Experimental results on the modified stochastic block model and six public datasets show that FairAD can achieve fair clustering while being up to 40 times faster than state-of-the-art fair graph clustering algorithms.
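The fairness condition stated above (each cluster mirrors the dataset-wide share of every protected group) can be checked directly. The sketch below reports the largest gap between a cluster's group share and the overall share; this "max proportion gap" is a diagnostic invented for this example and is not claimed to be the metric used in the paper.

```python
from collections import Counter


def group_proportions(groups):
    """Map each group label to its share of the given members."""
    counts = Counter(groups)
    return {g: c / len(groups) for g, c in counts.items()}


def max_proportion_gap(clusters, groups):
    """Largest |cluster share - dataset share| over all clusters and groups.

    0.0 means every cluster matches the dataset-wide proportions exactly.
    """
    overall = group_proportions(groups)
    worst = 0.0
    for cluster_id in set(clusters):
        members = [g for c, g in zip(clusters, groups) if c == cluster_id]
        local = group_proportions(members)
        for g, target in overall.items():
            worst = max(worst, abs(local.get(g, 0.0) - target))
    return worst


# Toy assignment: eight nodes, two protected groups of equal size.
groups = ["a", "a", "b", "b", "a", "a", "b", "b"]
fair_gap = max_proportion_gap([0, 0, 0, 0, 1, 1, 1, 1], groups)    # each cluster is half a, half b
unfair_gap = max_proportion_gap([0, 0, 1, 1, 0, 0, 1, 1], groups)  # clusters split by group
print(fair_gap, unfair_gap)
```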
- Asia > South Korea > Seoul > Seoul (0.05)
- North America > United States > Texas > Hays County > San Marcos (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
Alternatives to the Laplacian for Scalable Spectral Clustering with Group Fairness Constraints
Ojeda-Ruiz, Iván, Ju-Lee, Young, Dickens, Malcolm, Cambisaca, Leonardo
Recent research has focused on mitigating algorithmic bias in clustering by incorporating fairness constraints into algorithmic design. Notions such as disparate impact, community cohesion, and cost per population have been implemented to enforce equitable outcomes. Among these, group fairness (balance) ensures that each protected group is proportionally represented within every cluster. However, incorporating balance as a fairness metric into spectral clustering algorithms has led to computation times that leave room for improvement. This study aims to enhance the efficiency of spectral clustering algorithms by reformulating the constrained optimization problem using the Lagrangian method and the Sherman-Morrison-Woodbury (SMW) identity, resulting in the Fair-SMW algorithm. Fair-SMW employs three alternatives to the Laplacian matrix with different spectral gaps to generate multiple variations of Fair-SMW, achieving clustering solutions with comparable balance to existing algorithms while offering improved runtime performance. We present the results of Fair-SMW, evaluated using the Stochastic Block Model (SBM) and real-world network datasets, including LastFM, FacebookNet, Deezer, and German, to measure both runtime efficiency and balance. Fair-SMW runs twice as fast as the state of the art and is flexible enough to achieve twice as much balance.
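For readers unfamiliar with the SMW identity, the sketch below checks its rank-one special case (the Sherman-Morrison formula), $(A + uv^\top)^{-1} = A^{-1} - \frac{A^{-1}uv^\top A^{-1}}{1 + v^\top A^{-1}u}$, on a small diagonal matrix. This only illustrates the identity itself; it is not the Fair-SMW formulation, and the matrices and helper names are made up for the example.

```python
def sherman_morrison_inverse(a_diag, u, v):
    """Inverse of diag(a_diag) + u v^T via the Sherman-Morrison formula."""
    n = len(a_diag)
    ainv_u = [u[i] / a_diag[i] for i in range(n)]    # A^{-1} u
    vt_ainv = [v[j] / a_diag[j] for j in range(n)]   # v^T A^{-1}
    denom = 1.0 + sum(v[i] * ainv_u[i] for i in range(n))
    if abs(denom) < 1e-12:
        raise ValueError("rank-one update makes the matrix singular")
    inverse = []
    for i in range(n):
        row = []
        for j in range(n):
            base = 1.0 / a_diag[i] if i == j else 0.0   # entry of A^{-1}
            row.append(base - ainv_u[i] * vt_ainv[j] / denom)
        inverse.append(row)
    return inverse


def matmul(x, y):
    """Plain dense matrix product for small lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


a_diag = [2.0, 3.0, 4.0]
u = [1.0, 2.0, 3.0]
v = [0.5, -1.0, 2.0]
updated = [[(a_diag[i] if i == j else 0.0) + u[i] * v[j] for j in range(3)] for i in range(3)]
product = matmul(updated, sherman_morrison_inverse(a_diag, u, v))
# The product should be the identity up to floating-point error.
max_error = max(abs(product[i][j] - (1.0 if i == j else 0.0)) for i in range(3) for j in range(3))
print(max_error)
```

The appeal of this family of identities is that a low-rank change to a matrix with a cheap inverse can be inverted without refactoring the whole matrix.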
- North America > United States > Texas > Hays County > San Marcos (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
A Distributed Framework for Causal Modeling of Performance Variability in GPU Traces
Lahiry, Ankur, Pokharel, Ayush, Banday, Banooqa, Ockerman, Seth, Gueroudji, Amal, Zaeed, Mohammad, Islam, Tanzima Z., Pouchard, Line
Large-scale GPU traces play a critical role in identifying performance bottlenecks within heterogeneous High-Performance Computing (HPC) architectures. However, the sheer volume and complexity of a single trace of data make performance analysis both computationally expensive and time-consuming. To address this challenge, we present an end-to-end parallel performance analysis framework designed to handle multiple large-scale GPU traces efficiently. Our proposed framework partitions and processes trace data concurrently and employs causal graph methods and a parallel coordinates chart to expose performance variability and dependencies across execution flows. Experimental results demonstrate a 67% improvement in terms of scalability, highlighting the effectiveness of our pipeline for analyzing multiple traces independently.
- North America > United States > Wisconsin > Dane County > Madison (0.14)
- North America > United States > Texas > Hays County > San Marcos (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Energy (0.69)
- Government > Regional Government (0.47)
- Information Technology > Scientific Computing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Hardware (0.98)
- Information Technology > Graphics (0.88)
HPC Digital Twins for Evaluating Scheduling Policies, Incentive Structures and their Impact on Power and Cooling
Maiterth, Matthias, Brewer, Wesley H., Kuruvella, Jaya S., Dey, Arunavo, Islam, Tanzima Z., Menear, Kevin, Duplyakin, Dmitry, Kabir, Rashadul, Patki, Tapasya, Jones, Terry, Wang, Feiyi
Schedulers are critical for optimal resource utilization in high-performance computing. Traditional methods to evaluate schedulers are limited to post-deployment analysis, or simulators, which do not model associated infrastructure. In this work, we present the first-of-its-kind integration of scheduling and digital twins in HPC. This enables what-if studies to understand the impact of parameter configurations and scheduling decisions on the physical assets, even before deployment, or regarding changes not easily realizable in production. We (1) provide the first digital twin framework extended with scheduling capabilities, (2) integrate various top-tier HPC systems given their publicly available datasets, and (3) implement extensions to integrate external scheduling simulators. Finally, we show how to (4) implement and evaluate incentive structures, as well as (5) evaluate machine-learning-based scheduling, in such a novel digital-twin-based meta-framework to prototype scheduling. Our work enables what-if scenarios of HPC systems to evaluate sustainability, and the impact on the simulated system.
- North America > United States > Tennessee > Anderson County > Oak Ridge (0.50)
- North America > United States > Florida > Orange County > Orlando (0.14)
- North America > United States > Texas > Hays County > San Marcos (0.04)
- Energy (1.00)
- Government > Regional Government > North America Government > United States Government (0.64)
DyWPE: Signal-Aware Dynamic Wavelet Positional Encoding for Time Series Transformers
Irani, Habib, Metsis, Vangelis
A fundamental component enabling transformers to process sequential data is positional encoding, which addresses the inherent permutation invariance of self-attention mechanisms by injecting positional information into input representations. In time series analysis, the importance of positional encoding is amplified due to the intrinsic temporal dependencies and complex multi-scale patterns characteristic of temporal data [2, 3]. However, existing positional encoding methods, ranging from sinusoidal encodings [1] to sophisticated relative positioning schemes [4, 5], share a fundamental limitation: they are signal-agnostic. These methods derive positional information exclusively from abstract sequence indices (0, 1, ..., L-1) while remaining completely oblivious to the underlying signal characteristics. For instance, consider two time series segments occurring at identical absolute positions but exhibiting vastly different temporal dynamics: one representing a quiet, stable period with minimal variation, and another capturing volatile, high-frequency oscillations.
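The signal-agnostic limitation can be seen in a few lines. The sketch below implements the standard sinusoidal encoding, $PE(pos, 2i) = \sin(pos / 10000^{2i/d})$ and $PE(pos, 2i+1) = \cos(pos / 10000^{2i/d})$, and applies it to a quiet and a volatile segment of equal length. The two toy signals and the small `d_model` are assumptions chosen only to make the point; this is not the DyWPE method.

```python
import math


def sinusoidal_encoding(length, d_model):
    """Standard sinusoidal positional encoding; depends only on indices 0..length-1."""
    table = []
    for pos in range(length):
        row = []
        for k in range(d_model):
            # Dimensions 2i and 2i+1 share the frequency 1 / 10000^(2i / d_model).
            angle = pos / (10000 ** ((2 * (k // 2)) / d_model))
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


# Two toy segments with very different dynamics but identical length.
quiet = [0.0] * 8
volatile = [(-1.0) ** t * 5.0 for t in range(8)]

pe_quiet = sinusoidal_encoding(len(quiet), d_model=4)
pe_volatile = sinusoidal_encoding(len(volatile), d_model=4)
same_encoding = pe_quiet == pe_volatile
print(same_encoding)
```

Because the function never reads the signal values, both segments receive exactly the same positional information.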
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.35)
A Dimensionality-Reduced XAI Framework for Roundabout Crash Severity Insights
Chakraborty, Rohit, Das, Subasish
Roundabouts reduce severe crashes, yet risk patterns vary by conditions. This study analyzes 2017-2021 Ohio roundabout crashes using a two-step, explainable workflow. Cluster Correspondence Analysis (CCA) identifies co-occurring factors and yields four crash patterns. A tree-based severity model is then interpreted with SHAP to quantify drivers of injury within and across patterns. Results show higher severity when darkness, wet surfaces, and higher posted speeds coincide with fixed-object or angle events, and lower severity in clear, low-speed settings. Pattern-specific explanations highlight mechanisms at entries (fail-to-yield, gap acceptance), within multi-lane circulation (improper maneuvers), and during slow-downs (rear-end). The workflow links pattern discovery with case-level explanations, supporting site screening, countermeasure selection, and audit-ready reporting. The contribution to Information Systems is a practical template for usable XAI in public safety analytics.
- North America > United States > Ohio (0.25)
- North America > United States > Michigan (0.05)
- North America > United States > Texas > Hays County > San Marcos (0.04)
- North America > United States > Louisiana (0.04)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)
A Transformer-Based Cross-Platform Analysis of Public Discourse on the 15-Minute City Paradigm
Chhetri, Gaurab, Anderson, Darrell, Kutela, Boniphace, Das, Subasish
This study presents the first multi-platform sentiment analysis of public opinion on the 15-minute city concept across Twitter, Reddit, and news media. Using compressed transformer models and Llama-3-8B for annotation, we classify sentiment across heterogeneous text domains. Our pipeline handles long-form and short-form text, supports consistent annotation, and enables reproducible evaluation. We benchmark five models (DistilRoBERTa, DistilBERT, MiniLM, ELECTRA, TinyBERT) using stratified 5-fold cross-validation, reporting F1-score, AUC, and training time. DistilRoBERTa achieved the highest F1 (0.8292), TinyBERT the best efficiency, and MiniLM the best cross-platform consistency. Results show News data yields inflated performance due to class imbalance, Reddit suffers from summarization loss, and Twitter offers moderate challenge. Compressed models perform competitively, challenging assumptions that larger models are necessary. We identify platform-specific trade-offs and propose directions for scalable, real-world sentiment classification in urban planning discourse.
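Stratified k-fold cross-validation, as used in the benchmark above, keeps the class mix of every fold close to that of the full dataset. The sketch below shows one simple way to do this by dealing each class's samples round-robin into the folds; it is a deterministic toy version with made-up labels, not the authors' pipeline, and a real run would normally shuffle within each class first.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k=5):
    """Assign sample indices to k folds, dealing each class round-robin."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        for position, index in enumerate(by_class[label]):
            folds[position % k].append(index)
    return folds


# Toy sentiment labels with a 2:1:1 class ratio (invented for the example).
labels = ["negative"] * 10 + ["neutral"] * 5 + ["positive"] * 5
folds = stratified_folds(labels, k=5)
fold_counts = [Counter(labels[i] for i in fold) for fold in folds]
for counts in fold_counts:
    print(dict(counts))
```

Each fold ends up with the same 2:1:1 ratio as the whole set, so per-fold F1 scores are computed on comparable class mixes.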
- South America > Colombia (0.04)
- North America > United States > Texas > Hays County > San Marcos (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Media > News (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.89)
CognitiveSky: Scalable Sentiment and Narrative Analysis for Decentralized Social Media
Chhetri, Gaurab, Dutta, Anandi, Das, Subasish
The emergence of decentralized social media platforms presents new opportunities and challenges for real-time analysis of public discourse. This study introduces CognitiveSky, an open-source and scalable framework designed for sentiment, emotion, and narrative analysis on Bluesky, a federated Twitter or X.com alternative. By ingesting data through Bluesky's Application Programming Interface (API), CognitiveSky applies transformer-based models to annotate large-scale user-generated content and produces structured and analyzable outputs. These summaries drive a dynamic dashboard that visualizes evolving patterns in emotion, activity, and conversation topics. Built entirely on free-tier infrastructure, CognitiveSky achieves both low operational cost and high accessibility. While demonstrated here for monitoring mental health discourse, its modular design enables applications across domains such as disinformation detection, crisis response, and civic sentiment analysis. By bridging large language models with decentralized networks, CognitiveSky offers a transparent, extensible tool for computational social science in an era of shifting digital ecosystems.
- North America > United States > Texas > Hays County > San Marcos (0.05)
- North America > United States > District of Columbia > Washington (0.04)
- Asia > India (0.04)
- Research Report (0.50)
- Overview (0.34)
- Information Technology (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.92)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.55)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.49)